Microphone-Independent Robust Signal Processing Using Probabilistic Optimum Filtering

Authors

  • Leonardo Neumeyer
  • Mitch Weintraub
Abstract

A new mapping algorithm for speech recognition relates the features of simultaneous recordings of clean and noisy speech. The model is a piecewise nonlinear transformation applied to the noisy speech feature. The transformation is a set of multidimensional linear least-squares filters whose outputs are combined using a conditional Gaussian model. The algorithm was tested using SRI's DECIPHER™ speech recognition system [1-5]. Experimental results show how the mapping is used to reduce recognition errors when the training and testing acoustic environments do not match.

1. INTRODUCTION

In many practical situations an automatic speech recognizer has to operate in several different but well-defined acoustic environments. For example, the same recognition task may be implemented using different microphones or transmission channels. In this situation it may not be practical to re-collect a speech corpus to train the acoustic models of the recognizer. To alleviate this problem, we propose an algorithm that maps speech features between two acoustic spaces. The models of the mapping algorithm are trained using a small database recorded simultaneously in both environments.

In the case of steady-state additive homogeneous noise, we can derive an MMSE estimate of the clean speech filterbank log-energy features using a model for how the features change in the presence of this noise [6-7]. In these algorithms, the estimated speech spectrum is a function of the global spectral signal-to-noise ratio (SNR), the instantaneous spectral SNR, and the overall spectral shape of the speech signal. However, after studying simultaneous recordings made with two microphones, we believe that the relationship between the two simultaneous features is nonlinear. We therefore propose to use a piecewise-nonlinear model to relate the two feature spaces.

1.1. Related Work on Feature Mapping

Several algorithms in the literature have focused on experimentally training a mapping between the noisy features and the clean features [8-13]. The proposed algorithm differs from previous algorithms in several ways:

  • The MMSE estimate of the clean speech features in noise is trained experimentally rather than with a model as in [6, 7].
  • Several frames are joined together, similar to [13].
  • The conditional PDF is based on a generic noisy feature not necessarily related to the feature that we are trying to estimate. For example, we could condition the estimate of the cepstral energy on the instantaneous spectral SNR vector.
  • Multidimensional least-squares filters are used for the mapping transformation. This exploits the correlation of the features over time and among components of the spectral features at the same time.
  • Linear transformations are combined together without hard decisions.
  • All delta parameters are computed after mapping the cepstrum and cepstral energy.
  • The mapping parameters are trained using stereo recordings with two different microphones. Once trained, the mapping parameters are fixed.
  • The algorithm can either map noisy speech features to clean features during training, or clean features to noisy features during recognition.

1.2. Related Work on Adaptation

The algorithm used to map the incoming features into a more robust representation has some similarities to work on model adaptation. Some of the high-level differences between hidden Markov model (HMM) adaptation and the mapping algorithms proposed in this paper are:

  • The mapping algorithm works by primarily correcting shifts in the mean of the feature set that are correlated with observable information. Adapting HMM model parameters has certain degrees of freedom that the mapping algorithm does not have, for example the ability to change state variances and mixture weights.
  • Two HMM states that have identical probability distributions and are not tied can have different distributions after adaptation. These distributions cannot be differentiated by mapping features.
  • The mapping algorithms described in this paper are able to incorporate many pieces of information that have been tra-
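As a rough illustration of the idea summarized in the abstract (a set of least-squares filters whose outputs are combined through a Gaussian model, without hard decisions), the following Python sketch works on a single scalar feature. It is a simplified sketch under stated assumptions, not the authors' implementation. The function names, the use of scalar affine filters in place of multidimensional multi-frame filters, the fixed Gaussian regions, and the posterior-weighted least-squares training are all assumptions made for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posteriors(x, regions):
    """Return p(region | x) for regions given as (prior, mean, var) tuples.

    Assumption: the conditioning feature is the noisy feature itself.
    """
    likes = [w * gaussian_pdf(x, m, v) for (w, m, v) in regions]
    total = sum(likes)
    return [like / total for like in likes]


def fit_affine_filters(noisy, clean, regions):
    """Fit clean ~ a_k * noisy + b_k for each region k.

    Assumption: each stereo frame pair is weighted by the posterior of
    region k (weighted least squares); this is an illustrative choice.
    """
    filters = []
    for k in range(len(regions)):
        w = [posteriors(x, regions)[k] for x in noisy]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, noisy)) / sw
        my = sum(wi * y for wi, y in zip(w, clean)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, noisy))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, noisy, clean))
        a = sxy / sxx
        b = my - a * mx
        filters.append((a, b))
    return filters


def map_feature(x, regions, filters):
    """Combine all filter outputs, weighted by the region posteriors."""
    post = posteriors(x, regions)
    return sum(p * (a * x + b) for p, (a, b) in zip(post, filters))


if __name__ == "__main__":
    # Hypothetical stereo data: the clean feature is a piecewise-linear
    # function of the noisy feature.
    noisy = [i * 0.1 - 3.0 for i in range(61)]
    clean = [2.0 * x + 1.0 if x < 0 else 0.5 * x + 1.0 for x in noisy]
    regions = [(0.5, -1.5, 0.5), (0.5, 1.5, 0.5)]
    filters = fit_affine_filters(noisy, clean, regions)
    for x in (-2.0, 0.0, 2.0):
        print(x, map_feature(x, regions, filters))
```

Because the filter outputs are blended by posterior probability instead of selecting a single region, the mapped feature varies smoothly across region boundaries, which is one way to read the "without hard decisions" property listed above.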

Similar resources

Multiple Approaches to Robust Speech Recognition

This paper compares several different approaches to robust speech recognition. We review CMU’s ongoing research in the use of acoustical pre-processing to achieve robust speech recognition, in... 2. ACOUSTICAL PRE-PROCESSING We have found that two major factors degrading the performance of speech recognition systems using desktop microphones in normal office environments are additive noise and unkn...

Multi-microphone Correlation-based Processing for Robust Speech Recognition

In this paper we present a new method of signal processing for robust speech recognition using multiple microphones. The method, loosely based on the human binaural hearing system, consists of passing the speech signals detected by multiple microphones through bandpass filtering and nonlinear rectification operations, and then cross-correlating the outputs from each channel within each frequenc...

Adaptive-Filtering-Based Algorithm for Impulsive Noise Cancellation from ECG Signal

Suppression of noise and artifacts is a necessary step in biomedical data processing. Adaptive filtering is known as a useful method to overcome this problem. Among the various contaminants, some, such as the electrical activity of muscles, contribute impulsive noise. This paper deals with modeling real-life muscle noise with an α-stable probability distribution and adaptive filterin...

Speaker localization and tracking with a microphone array on a mobile robot using von Mises distribution and particle filtering

This paper deals with the problem of localizing and tracking a moving speaker over the full range around the mobile robot. The problem is solved by taking advantage of the phase shift between signals received at spatially separated microphones. The proposed algorithm is based on estimating the time difference of arrival by maximizing the weighted cross-correlation function in order to determine...

Speech/noise separation using two microphones and a VQ model of speech signals

In this paper we address the problem of using two or more microphones to enhance speech corrupted by nonstationary noise, such as that of a competing speaker (cocktail party effect) at very low SNR, by means of linear filtering of two microphone signals. This work is a variant of the probabilistic Independent Component Analysis (ICA) method but using a more accurate probability distribution of ...

Robust Speech Recognition Based on Human Binaural Perception

In this paper we present a new method of signal processing for robust speech recognition using multiple microphones. The method, based on human binaural hearing, consists of passing the speech signals detected by multiple microphones through bandpass filtering and nonlinear rectification operations, and then cross-correlating the outputs from each channel within each frequency band. These opera...

Publication date: 1994